Content
- bar plot
- boxplot
- heatmap
- network
In this workshop, we will do several steps to visualize metagenomics data using R. we will start with data generated in previous session, followed by abundance quantification and statistical analysis.
a. Bar plot illustrating the clinical covariates that are associated with variation in the neonatal gut microbiota on day 4 (n = 310 individuals), day 7 (n = 532 individuals), day 21 (n = 325 individuals) and in infancy (n = 302 individuals).b, Longitudinal changes in the mean relative abundance of genera of faecal bacteria, sampled on day 4, day 7, day 21 and in infancy, for genera with >1% mean relative abundance across all samples from the neonatal period.
Analysis of host attachment and growth rates of CPR/DPANN organisms.)
Heat map representing differentially abundant bacterial species (false discovery rate < 0.01) between HSPC and CRPC.
library(phyloseq)
library(ggplot2)
library(tidyverse)
library(ggpubr)
library(ggsci)
library(stringr)
library(dutchmasters)
load('phyloseq_object_data_2.Rdata')
phylo_obj_2@sam_data[1:3,1:3]
## SampleID Group Condition
## SRR16348844 SRR16348844 BP-2 Blautia-HFD
## SRR16348845 SRR16348845 BP-1 Blautia-HFD
## SRR16348846 SRR16348846 Norm-8 Control-Chow
phylo_obj_2@otu_table[1:3,1:3]
## OTU Table: [3 taxa and 3 samples]
## taxa are rows
## SRR16348844 SRR16348845 SRR16348846
## OTU 1 511626 1230 1782021
## OTU 2 69651 30157 0
## OTU 3 80033 150153 0
phylo_obj_2@tax_table[1:3,1:3]
## Taxonomy Table: [3 taxa by 3 taxonomic ranks]:
## Kingdom Phylum Class
## OTU 1 "k__Bacteria" "p__Verrucomicrobia" "c__Verrucomicrobiae"
## OTU 2 "k__Bacteria" "p__Proteobacteria" "c__Epsilonproteobacteria"
## OTU 3 "k__Bacteria" "p__Firmicutes" "c__Erysipelotrichia"
pd <- psmelt(phylo_obj_2)
pd[1:7,1:7]
## OTU Sample Abundance SampleID Group Condition Kingdom
## 2 OTU 1 SRR16348866 2917737 SRR16348866 Norm-2 Control-Chow k__Bacteria
## 18 OTU 1 SRR16348846 1782021 SRR16348846 Norm-8 Control-Chow k__Bacteria
## 3 OTU 1 SRR16348855 1577778 SRR16348855 Norm-3 Control-Chow k__Bacteria
## 1 OTU 1 SRR16348844 511626 SRR16348844 BP-2 Blautia-HFD k__Bacteria
## 21 OTU 1 SRR16348849 492468 SRR16348849 Norm-5 Control-Chow k__Bacteria
## 35 OTU 10 SRR16348857 401402 SRR16348857 HFD-3 Control-HFD k__Bacteria
## 1129 OTU 58 SRR16348852 392905 SRR16348852 HFD-7 Control-HFD k__Bacteria
pd = pd %>% mutate(Kingdom = str_replace(Kingdom, 'k__',''),
Phylum = str_replace(Phylum, 'p__',''),
Class = str_replace(Class, 'c__',''),
Order = str_replace(Order, 'o__',''),
Family = str_replace(Family, 'f__',''),
Genus = str_replace(Genus, 'g__',''),
Species = str_replace(Species, 's__',''))
colnames(pd)
## [1] "OTU" "Sample" "Abundance" "SampleID" "Group" "Condition"
## [7] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus"
## [13] "Species"
In R, we use ggplot2, learn this package - Tutorial - For bar plot, we will use geom_bar(), use ‘?geom_bar’ in Rstudio to check the usage of this function - geom_bar() makes the height of the bar proportional to the number of cases in each group
# create a basic bar plot
# The abundance of species in each sample
ggplot(data = pd, aes(x = SampleID, y = Abundance)) +
geom_bar(stat = 'identity')
ggplot(data = pd, aes(x = SampleID, y = Abundance)) +
theme_bw() +
geom_bar(stat = 'identity') +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = pd, aes(x = SampleID, y = Abundance)) +
geom_bar(stat = 'identity', aes(fill = Condition)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = pd, aes(x = SampleID, y = Abundance)) +
geom_bar(stat = 'identity', aes(fill = Condition),width = 0.2) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = pd, aes(x = SampleID, y = Abundance)) +
geom_bar(stat = 'identity', aes(fill = Condition),width = 0.8) +
#scale_fill_manual(values = c("blue","red","green")) +
scale_fill_manual(values = c("#b3e2cd","#fdcdac","#cbd5e8")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data = pd, aes(x = Class, y = Abundance)) +
geom_bar(stat = 'identity', aes(fill = Condition),width = 0.8) +
#scale_fill_manual(values = c("blue","red","green")) +
scale_fill_manual(values = c("#b3e2cd","#fdcdac","#cbd5e8")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
#length(unique(pd$Phylum))
#6
ggplot(data = pd, aes(x = Condition, y = Abundance)) +
geom_bar(stat = 'identity', position = "fill", aes(fill = Phylum),width = 0.8) +
scale_fill_jco()
#length(unique(pd$Class))
#13
ggplot(data = pd, aes(x = Condition, y = Abundance)) +
geom_bar(stat = 'identity', position = "fill", aes(fill = Class),width = 0.8) +
scale_fill_dutchmasters(palette = "milkmaid")
ggplot(data = pd, aes(x = Class, y = Abundance)) +
geom_bar(stat = 'identity', position = "fill", aes(fill = Condition),width = 0.8) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_jco()
ggplot(data = pd, aes(x = Class, y = Abundance)) +
geom_bar(stat = 'identity', position = "fill", aes(fill = Condition),width = 0.8) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_jco() +
labs(x = "Group", y = "Relative abundance (%)")
ggplot(data = pd, aes(x = Class, y = Abundance)) +
geom_bar(stat = 'identity', position = "fill", aes(fill = Condition),width = 0.8) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_jco() +
labs(x = "Group", y = "Relative abundance (%)")
ggplot(data = pd, aes(x = Condition, y = Abundance)) +
geom_bar(stat = 'identity', position = "fill", aes(fill =Phylum ),width = 0.8) +
theme_classic() +
theme(legend.position = 'top') +
scale_fill_jco() +
labs(x = "Group", y = "Relative abundance (%)")
fig <- ggplot(data = pd, aes(x = Condition, y = Abundance)) +
geom_bar(stat = 'identity', position = "stack", aes(fill =Phylum ),width = 0.8) +
theme_classic() +
theme(legend.position = 'top') +
scale_fill_jco() +
labs(x = "Group", y = "Relative abundance (%)")
ggsave(fig, filename = 'Con2Abundance.pdf', width = 5, height = 4)
load('Data2_Abundance_metadata.Rdata')
head(Metadata)
## SampleID Group Condition Observed Chao1 se.chao1 ACE se.ACE Shannon
## 1 SRR16348844 BP-2 Blautia-HFD 36 36 0 NaN NaN 1.90764074
## 2 SRR16348845 BP-1 Blautia-HFD 30 30 0 NaN NaN 2.26149321
## 3 SRR16348846 Norm-8 Control-Chow 9 9 0 NaN NaN 0.08668567
## 4 SRR16348849 Norm-5 Control-Chow 20 20 0 NaN NaN 1.96953105
## 5 SRR16348850 Norm-4 Control-Chow 21 21 0 NaN NaN 1.86096898
## 6 SRR16348851 HFD-8 Control-HFD 21 21 0 NaN NaN 2.05877370
## Simpson InvSimpson
## 1 0.69315682 3.258994
## 2 0.84753991 6.559094
## 3 0.02409417 1.024689
## 4 0.77738789 4.492119
## 5 0.79825707 4.956803
## 6 0.80557316 5.143323
ggplot(data = Metadata, aes(x = Condition, y = Shannon)) +
geom_boxplot(aes(fill = Condition))
my_comparisons = list(c("Blautia-HFD","Control-HFD"),c("Control-HFD","Control-Chow",""))
ggplot(data = Metadata, aes(x = Condition, y = Shannon)) +
geom_boxplot(aes(fill = Condition)) +
scale_fill_jco() +
stat_compare_means(comparisons = my_comparisons, method = "wilcox.test",aes(as_label(..p.format..))) # ..p.signif..
plot_heatmap(phylo_obj_2, method = "NMDS", distance = "bray")
## Warning: Transformation introduced infinite values in discrete y-axis
total = median(sample_sums(phylo_obj_2))
abundant_OTU <- filter_taxa(phylo_obj_2, function(x) sum(x > total*0.20) > 0, TRUE)
abundant_OTU
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12 taxa and 21 samples ]
## sample_data() Sample Data: [ 21 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 12 taxa by 7 taxonomic ranks ]
plot_heatmap(abundant_OTU, method = "NMDS", distance = "bray")
## Warning: Transformation introduced infinite values in discrete y-axis
- you can also convert ‘abundant_OTU’ object to a read friendly table.
pd_abundant = psmelt(abundant_OTU)
head(pd_abundant)
## OTU Sample Abundance SampleID Group Condition Kingdom
## 15 OTU 1 SRR16348866 2917737 SRR16348866 Norm-2 Control-Chow k__Bacteria
## 12 OTU 1 SRR16348846 1782021 SRR16348846 Norm-8 Control-Chow k__Bacteria
## 3 OTU 1 SRR16348855 1577778 SRR16348855 Norm-3 Control-Chow k__Bacteria
## 1 OTU 1 SRR16348844 511626 SRR16348844 BP-2 Blautia-HFD k__Bacteria
## 11 OTU 1 SRR16348849 492468 SRR16348849 Norm-5 Control-Chow k__Bacteria
## 34 OTU 10 SRR16348857 401402 SRR16348857 HFD-3 Control-HFD k__Bacteria
## Phylum Class Order
## 15 p__Verrucomicrobia c__Verrucomicrobiae o__Verrucomicrobiales
## 12 p__Verrucomicrobia c__Verrucomicrobiae o__Verrucomicrobiales
## 3 p__Verrucomicrobia c__Verrucomicrobiae o__Verrucomicrobiales
## 1 p__Verrucomicrobia c__Verrucomicrobiae o__Verrucomicrobiales
## 11 p__Verrucomicrobia c__Verrucomicrobiae o__Verrucomicrobiales
## 34 p__Firmicutes c__Bacilli o__Lactobacillales
## Family Genus Species
## 15 f__Akkermansiaceae g__Akkermansia s__Akkermansia_muciniphila
## 12 f__Akkermansiaceae g__Akkermansia s__Akkermansia_muciniphila
## 3 f__Akkermansiaceae g__Akkermansia s__Akkermansia_muciniphila
## 1 f__Akkermansiaceae g__Akkermansia s__Akkermansia_muciniphila
## 11 f__Akkermansiaceae g__Akkermansia s__Akkermansia_muciniphila
## 34 f__Lactobacillaceae g__Lactobacillus s__Lactobacillus_johnsonii
plot_heatmap(abundant_OTU, method = "MDS", distance = "(A+B-2*J)/(A+B-J)",
taxa.label = "Order", taxa.order = "Order",
trans=NULL, low="beige", high="red", na.value="beige")
ggplot(data = pd_abundant, aes(x=SampleID, y=Class, fill = Abundance)) +
geom_tile() +
scale_fill_continuous(low = 'beige', high = 'red') +
theme(axis.text.x = element_text(angle = 270, vjust = 0.5))
plot_net(phylo_obj_2, distance = "(A+B-2*J)/(A+B)", type = "taxa",
maxdist = 0.7, color="Class", point_label="Genus")
-End, hopefully you enjoyed this session!!